
Concurrency & Parallelism in Backend Systems


Why Backend Systems Need Concurrency

  • Every backend system must handle multiple requests simultaneously

  • If a server handles only one request at a time:

    • Other users must wait
    • Leads to poor performance or failures
  • Concurrency helps:

    • Utilize system resources efficiently
    • Handle thousands of users concurrently

Typical Request Lifecycle

  • User → Server → Database → Response

  • Key observation:

    • Server spends significant time waiting for external systems (DB, APIs)

Network Latency Examples

  • Local DB: ~1–2 ms
  • Same region: ~20–30 ms
  • Different region: ~90–100 ms

The Core Problem: Idle CPU

  • While waiting for DB response:

    • CPU does nothing
  • Modern CPU capability:

    • ~3 billion instructions/sec (~3 million per ms)
  • Example:

    • 100 ms wait → 300 million instructions wasted
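
The arithmetic above can be checked directly. A quick sketch, assuming the hypothetical ~3 billion instructions/sec figure from these notes:

```python
# Back-of-envelope cost of an idle CPU during an IO wait.
# Assumes a hypothetical CPU retiring ~3 billion instructions per second.
INSTRUCTIONS_PER_SEC = 3_000_000_000

per_ms = INSTRUCTIONS_PER_SEC // 1_000   # instructions per millisecond
wasted = per_ms * 100                    # a 100 ms cross-region DB wait

print(per_ms)   # 3000000   (~3 million per ms)
print(wasted)   # 300000000 (~300 million instructions forgone)
```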

IO vs CPU Work

IO-Bound Work

  • Waiting for:

    • Database
    • External APIs
    • File system
  • Takes ~70–95% of time in backend systems

CPU-Bound Work

  • Actual computation:

    • Validation
    • JSON parsing
    • Encryption
    • Image processing

Key Insight

  • Typical API call:

    • ~250 ms IO waiting
    • ~10 ms CPU work
  • Result:

    • 95% resource underutilization without concurrency
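
A minimal sketch in Python of why this matters: four simulated requests, each with ~250 ms of IO (modeled with `time.sleep`), handled sequentially vs concurrently with threads. The numbers are illustrative, not a benchmark:

```python
import threading
import time

def handle_request():
    time.sleep(0.25)  # simulated ~250 ms IO wait (DB/API); CPU work omitted

# Sequential: each request waits for the previous one (~1 s total)
start = time.perf_counter()
for _ in range(4):
    handle_request()
sequential = time.perf_counter() - start

# Concurrent: all four IO waits overlap (~0.25 s total)
start = time.perf_counter()
threads = [threading.Thread(target=handle_request) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
concurrent = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, concurrent: {concurrent:.2f}s")
```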

What is Concurrency?

  • Ability to handle multiple tasks at once (logically)

  • CPU switches between tasks:

    • Start → Pause → Resume

Key Idea

  • While one task waits (IO):

    • CPU works on another task

What is Parallelism?

  • Ability to execute multiple tasks simultaneously (physically)

Requirement

  • Multiple CPU cores

Concurrency vs Parallelism

Concurrency

  • Single CPU core
  • Tasks interleave execution
  • Improves resource utilization

Parallelism

  • Multiple CPU cores
  • Tasks run at same time
  • Improves execution speed

Simple Analogy

  • Concurrency:

    • One chef cooking multiple dishes (switching tasks)
  • Parallelism:

    • Multiple chefs cooking simultaneously

Timeline Understanding (Conceptual)

  • Request A starts → uses CPU → waits (DB)

  • CPU switches to Request B

  • When A’s response returns:

    • CPU resumes A later

Key Point

  • At any moment:

    • Only one task runs (single core)
    • But multiple tasks are in progress

Why This Matters

  • Backend systems are mostly IO-bound

  • Without concurrency:

    • CPU stays idle most of the time
  • With concurrency:

    • CPU is always utilized

When to Use What

Use Concurrency (Most Cases)

  • IO-heavy workloads:

    • DB queries
    • API calls
    • File operations

Use Parallelism

  • CPU-heavy workloads:

    • Image processing
    • Encryption
    • Video encoding
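
For the IO-heavy case, a thread pool is the usual tool. A sketch using Python's `concurrent.futures` (the `fetch` task and its 100 ms delay are made up for illustration); for CPU-heavy work you would reach for processes or threads spread across multiple cores instead:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch(i):
    time.sleep(0.1)   # stand-in for a DB query or API call
    return i * 2

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(fetch, range(10)))
elapsed = time.perf_counter() - start

# Ten 100 ms waits overlap, so this finishes in ~0.1 s, not ~1 s.
print(results, f"{elapsed:.2f}s")
```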

Real-World Backend Behavior

  • Server handles:

    • HTTP requests
    • Logging
    • Background jobs
    • Telemetry
  • All compete for CPU time

  • Concurrency ensures:

    • Efficient scheduling across all tasks

How Concurrency is Implemented

Two main mechanisms: threads and the event loop (async IO). These notes cover threads.

1. Threads

  • OS-level execution units

  • Each thread:

    • Has its own stack
    • Has its own instruction pointer
  • Managed by the OS scheduler
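
In Python these OS-level units are exposed through the `threading` module; a minimal sketch:

```python
import threading

def worker(results):
    # Locals here live on this thread's own stack.
    me = threading.current_thread()
    results.append((me.name, me.ident is not None))

results = []
t = threading.Thread(target=worker, args=(results,), name="worker-1")
t.start()   # from here on, the OS scheduler decides when the thread runs
t.join()    # wait for it to finish
print(results)  # [('worker-1', True)]
```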


Thread Scheduling

  • OS assigns time slices (e.g., 2 ms)

  • After time slice:

    • Thread pauses
    • Another thread runs

Preemptive Scheduling

  • The OS preempts (stops) threads automatically when their time slice expires
  • Ensures fairness across tasks

Blocking Behavior

  • When thread hits IO:

    • Marked as blocked
  • OS switches to another thread

  • Once IO completes:

    • Thread becomes runnable again
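
A sketch of blocking in action: one thread sits in a simulated IO wait (`time.sleep`) while the main thread keeps the CPU busy, then picks up the result once the wait completes:

```python
import threading
import time

result = {}

def slow_io():
    time.sleep(0.2)        # thread is blocked; the OS frees the CPU
    result["io"] = "done"  # runnable again once the wait ends

t = threading.Thread(target=slow_io)
t.start()

# Meanwhile the main thread does useful (here: dummy) CPU work.
count = 0
deadline = time.perf_counter() + 0.1
while time.perf_counter() < deadline:
    count += 1

t.join()
print(count > 0, result)  # True {'io': 'done'}
```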

Memory Model of Threads

Within Same Process

  • Threads share:

    • Heap memory
    • Global variables

Between Processes

  • No shared memory (isolated)

Communication Between Threads

  • Done via shared memory

  • Advantages:

    • Fast (no serialization)
  • Risks:

    • Race conditions
    • Data corruption
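
The classic race is an unguarded counter. The sketch below guards it with `threading.Lock`; removing the `with lock:` line makes the final value nondeterministic in principle (interpreter details can mask the race):

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        with lock:        # the read-modify-write must be atomic
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000 — deterministic only because of the lock
```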

Parallelism with Threads

  • If multiple CPU cores:

    • Multiple threads run truly in parallel
  • Improves:

    • CPU-bound performance

Cost of Threads

1. Memory Overhead

  • Each thread:

    • Stack ~KBs to MBs
  • Example:

    • 10,000 threads → several GB memory
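
The arithmetic, assuming a hypothetical 1 MB stack reservation per thread (the real default varies by OS and runtime, from tens of KB up to several MB):

```python
stack_bytes = 1 * 1024 * 1024      # assumed 1 MB stack per thread
thread_count = 10_000
total_gb = stack_bytes * thread_count / 1024**3
print(f"~{total_gb:.1f} GB")       # ~9.8 GB just for stacks
```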

2. Creation Overhead

  • Creating thread involves:

    • System call
    • Stack allocation
    • Scheduler registration
  • Takes:

    • Microseconds to milliseconds
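
This is easy to measure roughly; a sketch timing full create/start/join cycles in Python (absolute numbers vary widely by machine and OS):

```python
import threading
import time

def noop():
    pass

N = 200
start = time.perf_counter()
for _ in range(N):
    t = threading.Thread(target=noop)
    t.start()   # system call + stack allocation + scheduler registration
    t.join()
avg_us = (time.perf_counter() - start) / N * 1e6
print(f"~{avg_us:.0f} µs per thread lifecycle")
```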

Key Takeaways

1. Backend Bottleneck

  • Mostly IO-bound, not CPU-bound

2. Concurrency is Essential

  • Prevents CPU idle time
  • Enables handling many users

3. Parallelism is Situational

  • Useful for heavy computation tasks

4. Threads are Powerful but Expensive

  • High memory + creation cost
  • Need careful management

Mental Model to Remember

  • CPU is valuable → never keep it idle

  • While waiting → do other work

  • Structure program to:

    • Pause IO tasks
    • Resume later
    • Keep CPU busy
